Predicting Neighbor Distribution in Heterogeneous Information Networks
Abstract

Recently, considerable attention has been devoted to the 
prediction problems arising from heterogeneous information 
networks. In this paper, we present a new prediction task, 
Neighbor Distribution Prediction (NDP), which aims at 
predicting the distribution of the labels on neighbors of a 
given node and is valuable for many different applications in 
heterogeneous information networks. The challenges of NDP 
mainly come from three aspects: the infinity of the state 
space of a neighbor distribution, the sparsity of available 
data, and how to fairly evaluate the predictions. To address 
these challenges, we first propose an Evolution Factor Model 
(EFM) for NDP, which utilizes two new structures proposed 
in this paper, i.e. Neighbor Distribution Vector (NDV) to 
represent the state of a given node’s neighbors, and Neighbor 
Label Evolution Matrix (NLEM) to capture the dynamics of 
a neighbor distribution, respectively. We further propose a 
learning algorithm for the Evolution Factor Model. To overcome
the problem of data sparsity, the learning algorithm first
clusters all the nodes and learns an NLEM for each cluster
instead of for each node. To evaluate the prediction
results fairly, we propose a new metric, Virtual Accuracy (VA),
which takes into consideration both the absolute accuracy 
and the predictability of a node. Extensive experiments 
conducted on three real datasets from different domains 
validate the effectiveness of our proposed model EFM and 
metric VA. 

1 Introduction 

As part of the recent surge of research on information
networks, considerable attention has been devoted to
prediction problems in heterogeneous information networks.
Existing research, however, mainly focuses
on predictions around a single link. For example,
some works are interested in predicting whether or
when a link will be built in the future,
and some works concern predicting the strength of a link,
such as predicting the ratings that customers will give to
items or locations. Existing research pays surprisingly
little attention to the prediction of neighbor
distributions, where the states of neighbors are considered
as a whole.

Figure 1: Neighbor Distribution


Fig. 1 offers an illustration of a neighbor distribution.
The left part of Fig. 1 shows that User A has two movie
nodes as its neighbors, which means User A rented two
movies. The neighbor distribution of node User A is the
distribution of labels on its neighbor nodes, as shown by
the right part of Fig. 1, where the neighbors of a given
node have equal weight, and the weight of a neighbor
node is divided uniformly among its labels.

The neighbor distribution of a node usually evolves 
over time. For example, a user might rent movies of 
different genres as his/her taste changes. Such evolution 
makes the prediction of neighbor distributions valuable 
for many different applications. 

Motivating Example For an online sports video
provider, the type distribution of subscribers is crucial
for developing its sales strategy. The provider may be
misled if it takes only the recent sales data into
consideration and ignores how the type distribution
evolves. For example, suppose fans of another sport are
the major subscribers in May 2014, which may lead
the provider to put more advertisements for that sport online.
However, the number of soccer fans increases slowly in May,
and soccer fans become the major subscribers in June 2014 with the
opening of the quadrennial soccer event, the World Cup.
Traditional recommender system methods may ignore
the small increase of soccer fans in May.




In this paper, we aim at the problem of predicting
the neighbor distribution of a given node in a
heterogeneous information network, which poses three main
challenges:

• Infinite state space of neighbor distributions
Since the fraction of a label is a real value, the
number of possible states of a neighbor distribution is
theoretically infinite. Traditional temporal models such
as Markov chains cannot serve our goal because they
often assume a finite state space.

• Sparsity of heterogeneous links In most cases,
the links between one specific node and its heterogeneous
neighbors are relatively sparse compared with
the huge volume of the whole data set, e.g., "publishing"
in DBLP, "rating" in Netflix and "checking in"
in Foursquare. The sparsity of links between heterogeneous
nodes makes it harder to mine sufficient meaningful
patterns for individuals.

• Fairly evaluating predictions Not all nodes are 
equally predictable, hence the traditional metrics that 
just take absolute accuracy into account are unable to 
appropriately assess the predictions for the nodes that 
are less predictable. We need a new metric that can 
treat every node fairly. 

In this paper, inspired by the idea of the Factor Model,
we propose an Evolution Factor Model (EFM)
to accurately predict neighbor distributions from
sparse data. Our main contributions can be summarized
as follows:

(1) We introduce an Evolution Factor Model (EFM) for 
accurately predicting neighbor distributions. EFM 
employs our proposed data structures, Neighbor 
Distribution Vector (NDV) and Neighbor Label 
Evolution Matrix (NLEM), to represent the infinite 
state space of a neighbor distribution and capture 
the evolution of neighbor distributions respectively. 

(2) We propose a new prediction metric, Virtual Accuracy
(VA), which takes into consideration both
the absolute accuracy and the difficulty of a prediction
to fairly evaluate the prediction results of
nodes with different predictabilities.

(3) We conduct extensive experiments on three real 
datasets, and compare EFM with an empirical 
method and two existing methods. The results 
validate the performance of our proposed model, 
algorithm and accuracy metric. 

The rest of this paper is organized as follows.
We give the problem definition and formalization in
Section 2. In Section 3, we describe our prediction
model EFM, and further present the learning algorithm
for EFM. We discuss the predictability of nodes and
propose a prediction metric in Section 4. We present the
experimental results and analysis in Section 5. Finally,
we discuss related work in Section 6, and conclude in
Section 7.

2 Problem Definition 

2.1 Heterogeneous Information Network A heterogeneous
information network contains multiple types
of objects and links. In this paper, we only consider
heterogeneous information networks with a star network
schema [13], i.e., links exist only between the center
type of nodes, the target nodes, and several other
types of nodes, the attribute nodes. For example, in a
Location Based Social Network, the target nodes are
users, and the attribute nodes can be venues, ratings or
tips.

We denote a heterogeneous information network
with star network schema by $G = (V, E)$, where $V$ is
the node set and $E$ is the link set. We denote the
target and attribute node sets by $X \subset V$ and $U \subset V$
respectively. The nodes and links in a network are
created and removed over time. To capture these
dynamics, we use a time window, denoted by $T$,
to obtain a timely neighbor distribution
from the dynamic network. The node set
and the attribute node set in $T$ are denoted by $V_T$ and
$U_T$. We use $T_h$, $T_c$ and $T_f$ to represent the historical,
current and future time windows respectively.

2.2 Label Distribution Vector 

Assuming the universal label set of a given attribute
node set $U$ is denoted by $\beta_U = \{\beta_U^{(1)}, \dots, \beta_U^{(i)}, \dots, \beta_U^{(n)}\}$,
where $\beta_U^{(i)}$ is a label and $n$ is the number of label types,
the label distribution vector is defined as
follows:

Definition 2.1. Label Distribution Vector
(LDV) For a given attribute node $u \in U$, its LDV,
$\vec{v}_u \in \mathbb{R}^n$, is defined as:

(2.1) $\vec{v}_u = (v_u^{(1)}, \dots, v_u^{(i)}, \dots, v_u^{(n)})$,

where $n$ is the number of all attribute nodes' label types;
$v_u^{(i)} = \frac{I_u(\beta_U^{(i)})}{\sum_{k=1}^{n} I_u(\beta_U^{(k)})}$,
and $I_u(\beta_U^{(i)})$ is 1 if $u$ has the label $\beta_U^{(i)}$,
and 0 otherwise.

LDV measures the label distribution of an attribute 
node, which is fixed. For example, the labels of an 
article are subject areas, thus the LDV depicts the 
direction of this article. 

2.3 Neighbor Distribution Vector 



Definition 2.2. Neighbor Distribution Vector
(NDV) For a given target node $x \in X$, the NDV of
$x$'s attribute node neighbors $U_T^x \subset U_T$ in time window
$T$, $\vec{w}_x(U_T^x)$, is defined as:

(2.2) $\vec{w}_x(U_T^x) = (w_x^{(1)}, \dots, w_x^{(i)}, \dots, w_x^{(n)})$,

where $n$ is the number of the given attribute nodes' label
types; $w_x^{(i)} = \frac{\sum_{u \in U_T^x} v_u^{(i)} + 1}{|U_T^x| + n}$, $i \in [1, n]$,
and $v_u^{(i)}$ is the $i$-th component of the LDV of attribute node $u$.

Hereinafter, we simply denote an NDV by $\vec{w}_x$ if the
context is unambiguous. Note that: (1) For smoothing,
we add 1 to the numerator and $n$ to the denominator
of $w_x^{(i)}$. (2) A node has one NDV for each
type of attribute node neighbors. For example,
in DBLP, there are two types of attribute nodes,
"Articles" and "Journals"; therefore, the target
node "Scholar" should have two NDVs, one for its
"Article" neighbors and the other for its "Journal"
neighbors.
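
The two definitions above can be illustrated with a small sketch. It assumes that each attribute node is given as a set of labels, that an LDV spreads weight uniformly over a node's labels, and that an NDV is the add-one smoothed sum of LDVs; the function names and the movie data are hypothetical and only serve as an example.

```python
def label_distribution_vector(labels, label_set):
    """LDV of one attribute node: weight 1/(number of its labels) per label it carries."""
    present = [1.0 if lab in labels else 0.0 for lab in label_set]
    total = sum(present)
    if total == 0:
        raise ValueError("attribute node has no label from the label set")
    return [p / total for p in present]


def neighbor_distribution_vector(neighbor_labels, label_set):
    """NDV of a target node: (sum of neighbor LDVs + 1) / (number of neighbors + n)."""
    n = len(label_set)
    sums = [0.0] * n
    for labels in neighbor_labels:
        ldv = label_distribution_vector(labels, label_set)
        for i, value in enumerate(ldv):
            sums[i] += value
    denominator = len(neighbor_labels) + n
    return [(s + 1.0) / denominator for s in sums]


# Hypothetical example: a user who rented two movies in one time window.
genres = ["Action", "Comedy", "Drama"]
rented = [{"Action"}, {"Comedy", "Drama"}]
ndv = neighbor_distribution_vector(rented, genres)
print(ndv)  # components are positive and sum to 1 (up to rounding)
```

Because every LDV sums to 1, the smoothed components always sum to 1, and no component is ever exactly 0.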


2.4 Problem Statement

Based on NDV, we can formally state the problem of
Neighbor Distribution Prediction (NDP) as follows:

Given the historical, current and future time
windows $T_h$, $T_c$ and $T_f$, a target node
$x$, and the NDVs of $x$'s attribute node neighbors in time
windows $T_h$ and $T_c$, we want to predict $\vec{w}_x(U^x_{T_f})$.

3 Evolution Factor Model 

In this section, we describe our Evolution Factor Model 
(EFM). At first, we briefly introduce the basic idea of 
Factor Model. 

3.1 Evolution Factor Model 

Recommender system methods based on latent factor
matrix models take the data in the historical and current
time windows as a whole, but disregard the evolution of the
network. In order to capture the dynamics, the Evolution
Factor Model first stores the probability of changes
from one label to another, which leads to the following
definition of the Neighbor Label Evolution Matrix (NLEM).

Definition 3.1. Neighbor Label Evolution Matrix
For a given node $x$, its Neighbor Label Evolution
Matrix of attribute nodes $U$ from time window $T_p$ to $T_q$,
denoted by $L_x^{\langle U^x_{T_p}, U^x_{T_q} \rangle} \in \mathbb{R}^{n \times n}$, is a matrix in which a
cell $L_x(i, j)$ is the probability that $x$'s neighbor label
changes from $\beta_U^{(i)}$ to $\beta_U^{(j)}$, i.e.,

(3.3) $L_x(i, j) = P(\beta_U^{(j)} \mid \beta_U^{(i)})$.

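Evolution Factor Model Based on NLEM, for a
given target node $x$, we can predict its NDV $\vec{w}_x(U^x_{T_q})$ as
the transformation of its historical NDV, $\vec{w}_x(U^x_{T_p})$,
through its NLEM $L_x^{\langle U^x_{T_p}, U^x_{T_q} \rangle}$, which leads to our EFM
as follows:

(3.4) $\vec{w}_x(U^x_{T_q}) = L_x^{\langle U^x_{T_p}, U^x_{T_q} \rangle} \times \vec{w}_x(U^x_{T_p})$.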


Figure 2: An example of Evolution Factor Model

Fig. 2 gives an example that shows how EFM
works. In Fig. 2 there are four labels representing
different research directions: DB (Data Base), ML (Machine
Learning), DM (Data Mining), IP (Image Processing).
A cell $(i, j)$ of the NLEM represents the probability
that scholars change their research directions from $i$ to
$j$. Given an NLEM learned from historical data and
the current NDVs of Scholar A and Scholar B, we can
infer their next NDVs by transforming their current
NDVs with the learned NLEM.

Note that the essence of the matrix product in EFM
is different from that in a general factor model. In a
general factor model, the matrix product is static: it
considers the data in the historical and current time windows
as a whole. In contrast, the NLEM in our EFM
captures the changes from the historical time window to
the current time window, and itself changes as the given
target node's neighbor labels evolve over time.

3.2 Model Learning 

To overcome the issue of data sparsity, and following the
heuristic that similar individuals have similar
behaviors, we first apply a clustering algorithm to all the
target nodes, and learn the NLEM for the cluster that
$x$ belongs to. Accordingly, given a target node
$x$ and the node set $X'$ consisting of the nodes belonging
to the same cluster as $x$, we can learn the NLEM as follows:

(3.5) $L_x = \arg\min \sum_{x' \in X'} e^2$,

where $e = |\vec{w}_{x'}(U^{x'}_{T_c}) - \hat{w}_{x'}(U^{x'}_{T_c})|$ and
$\hat{w}_{x'}(U^{x'}_{T_c}) = L_x \times \vec{w}_{x'}(U^{x'}_{T_h})$.

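Note that $\hat{w}_{x'}(U^{x'}_{T_c})$ is the estimate of $\vec{w}_{x'}(U^{x'}_{T_c})$ for
target node $x'$. So the NLEM is defined as the
optimal matrix that minimizes the overall error of the
estimates over all target nodes in $X'$. Thus, to learn the
NLEM, we adopt the least squares method to fit a linear
regression between the NDVs in $T_h$ and $T_c$ as follows:

(3.6) $L_x^{\langle U^x_{T_h}, U^x_{T_c} \rangle} = Y X^{\top} (X X^{\top})^{-1}$,

where $X = (X^{(1)}, \dots, X^{(|X'|)})$ and $Y = (Y^{(1)}, \dots, Y^{(|X'|)})$
are the matrices whose $i$-th columns are
$X^{(i)} = \vec{w}_{x'_i}(U^{x'_i}_{T_h})$ and $Y^{(i)} = \vec{w}_{x'_i}(U^{x'_i}_{T_c})$,
and $x'_i$ is the $i$-th node of $X'$.

The learning algorithm for EFM is shown in Algorithm 1.

Algorithm 1 Learning Algorithm for EFM
$(x, T_h, T_c, X_{T_h}, X_{T_c}, U, K)$

INPUT:

$x$: A given node;

$T_h$: Assigned historical time window;

$T_c$: Assigned current time window;

$X_{T_h}$: A subset of $X$ in time window $T_h$;

$X_{T_c}$: A subset of $X$ in time window $T_c$;

$U$: The given attribute node set;

$K$: The parameter of K-means;

OUTPUT:

$L_x^{\langle U^x_{T_h}, U^x_{T_c} \rangle}$: The NLEM to be learned;

1: $X' = \emptyset$, $X = \emptyset$, $Y = \emptyset$, $W = \{x' \mid x' \in X_{T_h} \cap X_{T_c}\}$;

2: for each $x' \in W$ do

3: Compute $\vec{w}_{x'}(U^{x'}_{T_h})$ and $\vec{w}_{x'}(U^{x'}_{T_c})$;

4: end for

5: Do K-means on $W$ based on the similarities between
the NDVs $\vec{w}_{x'}$;

6: $X'$ = {the nodes of the cluster that $x$ belongs to};

7: for $i = 1$ to $|X'|$ do

8: $X^{(i)} = \vec{w}_{x'_i}(U^{x'_i}_{T_h})$;

9: $Y^{(i)} = \vec{w}_{x'_i}(U^{x'_i}_{T_c})$;

10: end for

11: Compute $L_x^{\langle U^x_{T_h}, U^x_{T_c} \rangle}$ according to Equation (3.6),
where the inverse matrix is computed by the Gauss-Jordan
method;

In our learning algorithm, any classical clustering
algorithm is applicable. We choose K-means as the
clustering algorithm, where
the similarities are measured by Euclidean distances
between NDVs. The selection of $K$ is discussed in
Section 5.

3.3 Prediction

Given a target node $x$, its NDV $\vec{w}_x(U^x_{T_h + T_c})$ and the
learned NLEM $L_x^{\langle U^x_{T_h}, U^x_{T_c} \rangle}$, the prediction of $\vec{w}_x(U^x_{T_f})$
can be made by EFM (Equation (3.4)).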
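
The least-squares learning step and the EFM prediction can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the NDVs passed in already belong to the cluster of the given node (the K-means step is omitted), reads the model literally as "estimate = NLEM × NDV" with NDVs as column vectors, and uses hypothetical function names and toy numbers. The inverse is computed by Gauss-Jordan elimination, as the algorithm states.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gauss_jordan_inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: more (or more varied) NDVs are needed")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def learn_nlem(hist_ndvs, curr_ndvs):
    """Least-squares L with curr ≈ L × hist for every node of the cluster."""
    X = transpose(hist_ndvs)   # n × m, column i is node i's NDV in T_h
    Y = transpose(curr_ndvs)   # n × m, column i is node i's NDV in T_c
    Xt = transpose(X)
    return matmul(matmul(Y, Xt), gauss_jordan_inverse(matmul(X, Xt)))


def predict_ndv(nlem, ndv):
    """EFM prediction: multiply the learned NLEM by an NDV (column vector)."""
    return [sum(l * w for l, w in zip(row, ndv)) for row in nlem]


# Toy cluster with two labels; the current NDVs were generated by true_nlem.
true_nlem = [[0.9, 0.2], [0.1, 0.8]]
hist = [[0.7, 0.3], [0.4, 0.6], [0.5, 0.5]]
curr = [predict_ndv(true_nlem, w) for w in hist]

learned = learn_nlem(hist, curr)
predicted = predict_ndv(learned, curr[0])
print(learned)
print(predicted)
```

With noise-free toy data the fit recovers the generating matrix; on real data the learned matrix is only the best linear fit for the cluster, and the inversion fails if the cluster's NDVs do not span all label dimensions.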


4 Prediction Metric 

4.1 Normalized Absolute Accuracy 

We can intuitively measure the absolute accuracy of
predictions for a given node $x$ in terms of the Euclidean
distance between its true NDV $\vec{w}_x$ and its estimate $\hat{w}_x$.

By Definition 2.2, the components of an NDV are positive
and sum to 1, so the Euclidean
distance between any two NDVs is less than or equal
to $\sqrt{2}$. We can then define the normalised absolute
accuracy as follows:

(4.7) $q_x = 1 - \frac{d(\vec{w}_x, \hat{w}_x)}{\sqrt{2}}$,

where $d(\vec{w}_x, \hat{w}_x)$ is the Euclidean distance between $\vec{w}_x$
and $\hat{w}_x$.
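
The bound on the distance between two NDVs used above can be checked directly. The short derivation below is a sketch that uses only the two stated properties of an NDV (nonnegative components that sum to 1); the symbols $a$ and $b$ are generic stand-ins for any two NDVs.

```latex
\begin{align*}
d(a, b)^2 &= \sum_{i=1}^{n} (a_i - b_i)^2
           = \sum_{i=1}^{n} a_i^2 + \sum_{i=1}^{n} b_i^2 - 2 \sum_{i=1}^{n} a_i b_i \\
          &\le \sum_{i=1}^{n} a_i^2 + \sum_{i=1}^{n} b_i^2
           && \text{since } a_i b_i \ge 0 \\
          &\le \Big( \sum_{i=1}^{n} a_i \Big)^2 + \Big( \sum_{i=1}^{n} b_i \Big)^2 = 2
           && \text{since all cross terms are nonnegative.}
\end{align*}
```

Hence $d(a, b) \le \sqrt{2}$, and dividing the distance by $\sqrt{2}$ yields a value in $[0, 1]$.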

4.2 Predictability 

As we have mentioned, it is unfair to assess a prediction
only in terms of absolute accuracy, since predictability
differs across nodes.

Intuitively, the predictability of a node is related
to its susceptibility to similar homogeneous nodes.
Specifically, the more easily a node can be influenced by
others, the more disordered its temporal pattern is, and
the greater its predictability is. For example, in a given
research field, the leading scholars' directions are difficult
to capture, because they rarely change their research
directions and such changes are mainly breakthroughs.
These changes can hardly be predicted
compared with their long-term stable studies. In contrast,
the research direction of a PhD candidate is more likely to be
influenced by his/her supervisor or the leading scholars.
Inspired by this observation, we define Prediction
Difficulty as a measure of how difficult it is to predict a
given node's NDV.

Definition 4.1. Prediction Difficulty (PD) For a
node $x$, the prediction difficulty of its NDV of attribute
node neighbors $U_T^x$ in time window $T$, denoted by
$g_x(U_T^x)$, is defined as:

$g_x(U_T^x) = 1 - h_x(U_T^x)/2$,

where $h_x(U_T^x)$ is the temporal entropy of $x$'s NDV in
time window $T$, and

$h_x(U_T^x) = -\sum_{i=1}^{n} w_x^{(i)}(U_T^x) \log_n w_x^{(i)}(U_T^x)$.

Note that $w_x^{(i)} \in (0, 1)$, so $h_x(U_T^x) > 0$, and
when $w_x^{(i)} = 1/n, \forall i \in [1, n]$, $h_x(U_T^x)$ reaches its
maximum, which equals $-\sum_{i=1}^{n} \frac{1}{n} \log_n \frac{1}{n} = 1$. Thus,
$h_x(U_T^x) \in (0, 1]$ and $g_x(U_T^x) \in [1/2, 1)$. In Definition
4.1, we use the temporal entropy $h_x(U_T^x)$ to measure how
disordered a node's temporal pattern is. We can
see that the more disordered the temporal pattern, the
greater the temporal entropy, and consequently the
lower the prediction difficulty, which is in line with our
expectation.
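
As a small illustration of the definition above, the sketch below computes the temporal entropy with logarithm base n and the resulting prediction difficulty. The function names and the example vectors are hypothetical; the NDVs are assumed to have at least two components, all positive.

```python
import math


def temporal_entropy(ndv):
    """Entropy of an NDV with logarithm base n (the number of label types)."""
    n = len(ndv)
    return -sum(w * math.log(w, n) for w in ndv if w > 0)


def prediction_difficulty(ndv):
    """PD = 1 - entropy / 2, so a more disordered NDV is less difficult."""
    return 1.0 - temporal_entropy(ndv) / 2.0


uniform = [0.25, 0.25, 0.25, 0.25]      # maximally disordered NDV
focused = [0.85, 0.05, 0.05, 0.05]      # NDV concentrated on one label

pd_uniform = prediction_difficulty(uniform)
pd_focused = prediction_difficulty(focused)
print(pd_uniform, pd_focused)
```

The uniform vector attains the lower end of the PD range, while the concentrated vector is rated as harder to predict.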

4.3 Virtual Accuracy 

Now we further define Virtual Accuracy based on absolute
accuracy and prediction difficulty as follows:

Definition 4.2. Virtual Accuracy (VA) For a prediction
of $\vec{w}_x(U^x_{T_f})$, its Virtual Accuracy, denoted by
$\delta_x$, is defined as:

(4.8) $\delta_x = q_x \times g_x$,

where $q_x$ is the absolute accuracy and $g_x$ is the prediction
difficulty.

As Equation (4.8) shows, we define the VA of a prediction
as the product of the absolute accuracy and the
prediction difficulty of that prediction. Since $q_x$ and $g_x$ are
both nonnegative, VA favors the predictions
whose absolute accuracy and difficulty are both
high. As we will see in later experiments, $q_x$ is negatively
correlated with $g_x$. Thus even if the absolute accuracy
of a difficult prediction is low, its VA can still
be expected not to be low, since its prediction difficulty is
large. On the other hand, even if the absolute accuracy
of an easy prediction is high, its VA is expected to
be low due to its small prediction difficulty.
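
The metric can be sketched in a few lines. The sketch assumes that the normalised absolute accuracy is one minus the Euclidean distance divided by the square root of 2, and it takes the prediction difficulty as a given number; the function names and the example values are hypothetical.

```python
import math


def absolute_accuracy(true_ndv, est_ndv):
    """Assumed normalisation: 1 - Euclidean distance / sqrt(2), a value in [0, 1]."""
    distance = math.sqrt(sum((t - e) ** 2 for t, e in zip(true_ndv, est_ndv)))
    return 1.0 - distance / math.sqrt(2.0)


def virtual_accuracy(true_ndv, est_ndv, difficulty):
    """VA = absolute accuracy × prediction difficulty."""
    return absolute_accuracy(true_ndv, est_ndv) * difficulty


true_ndv = [0.5, 0.3, 0.2]
est_ndv = [0.4, 0.4, 0.2]
accuracy = absolute_accuracy(true_ndv, est_ndv)
va_hard = virtual_accuracy(true_ndv, est_ndv, difficulty=0.9)
va_easy = virtual_accuracy(true_ndv, est_ndv, difficulty=0.6)
print(accuracy, va_hard, va_easy)
```

For the same absolute accuracy, the prediction on the harder node receives the higher VA, which is the behaviour the definition aims for.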

5 Experimental Evaluation 
5.1 Datasets 

We learn the NLEM from the NDVs in $T_h$ and $T_c$. For
predicting the neighbor distribution, we take the NDVs in
$T_h + T_c$ as the training set, and the NDVs in $T_f$ as the test set.

The datasets we use to validate our model and
algorithm are from three different domains: DBLP (a
Coauthor Network), Netflix (a Movie Rental Network),
and Foursquare (a Location Based Social Network).
The summary of the datasets is shown in Table 1.

DBLP indexes more than 2.3 million
articles and contains many links to home pages of
computer scientists. The labels of "Article" cover
25 directions of Computer Science, thus an NDV of
a "Scholar" node consists of 25 components. With
the historical, current and future time windows $T_h$ =
[2006, 2010), $T_c$ = [2010, 2011) and $T_f$ = [2011, 2012],
we randomly select 1000 scholars who published articles
in all three time windows.

Netflix [1] contains more than 100 million
rating records from about 480,000 customers over about
17,000 movie titles. The labels of "Movie" cover
28 genres crawled from the website IMDb [5], thus
an NDV of a "User" node consists of 28 components.
With the historical, current and future time windows
$T_h$ = [Apr. 12th 2005, Oct. 12th 2005), $T_c$ = [Oct. 12th
2005, Nov. 12th 2005) and $T_f$ = [Nov. 12th 2005, Dec. 12th
2005], we randomly select 1,000 users who have movie
rating records in all three time windows.

Foursquare [2] involves about 4.3 million friendships
and about 80,000 check-in tips of users during
3 years. The labels of "Venue" cover 8 categories
given by Foursquare. With the historical, current
and future time windows $T_h$ = [0th day, 966th day), $T_c$
= [966th day, 996th day) and $T_f$ = [996th day, 1026th
day], we randomly select 500 users who have check-in
records in all three time windows.

5.2 Baseline 

In order to demonstrate the effectiveness of our EFM, 
we compare our method with the following baseline 
methods: 

• MVM (Mean Value Method) MVM is an empirical 
method. It takes the mean of the latest NDVs of the 
nodes in the cluster that the predicted node belongs 
to, as the estimate of the next NDV of a given node. 

• MF (Basic Matrix Factorization) [17] MF was proposed
by B. Webb to solve the movie recommendation problem
in the Netflix Prize. MF assumes the features of objects
can be expressed as a series of factors, and that different
types of objects have the same number of factors.
When predicting the preference of given objects
of type A for objects of type B, the preferences
(called "ratings" in many cases) can be
expressed as the product of the factors of the
objects of type A and B. The general expression of
the factor model is:

(5.9) $R = PQ^{\top}$,

where $P \in \mathbb{R}^{N \times D}$ is the factor matrix of the objects
of type A, $Q \in \mathbb{R}^{M \times D}$ is the factor matrix of the objects
of type B, $N$ and $M$ are the numbers of objects
of type A and type B respectively, and $D$ is the
number of factors.


Dataset     Network        Type of predicted nodes  Type of predicted node's neighbors  #Neighbor's labels  #Predicted nodes
DBLP        heterogeneous  author                   paper                               25                  1000
Netflix     heterogeneous  user                     movie                               28                  1000
Foursquare  heterogeneous  user                     venue                               8                   500

Table 1: Summary of Datasets.


• BiasedMF (Biased Matrix Factorization) [9] BiasedMF,
proposed by Paterek, is an extension
of Basic Matrix Factorization. BiasedMF adds
bias terms to the objects of either type. The
prediction formula is:

(5.10) $\hat{r}_{u,m} = b_u + b_m + \sum_{k=1}^{D} p_{u,k} \cdot q_{m,k}$,

where $\hat{r}_{u,m}$ is an estimate of the rating that object $u$
of type A gives to object $m$ of type B, $p_{u,k}$
and $q_{m,k}$ are the cells of the factor matrices of type
A and type B respectively, and $b_u$ and $b_m$ are the biases
of objects $u$ and $m$ respectively.
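
Of the baselines listed above, MVM is simple enough to state in a few lines. The sketch below assumes that the latest NDVs of the nodes in the predicted node's cluster are already available as equal-length lists; the function name and the numbers are hypothetical.

```python
def mean_value_prediction(cluster_ndvs):
    """MVM: component-wise mean of the latest NDVs of the nodes in one cluster."""
    count = len(cluster_ndvs)
    if count == 0:
        raise ValueError("the cluster must contain at least one NDV")
    return [sum(component) / count for component in zip(*cluster_ndvs)]


# Latest NDVs of three nodes that fall into the same cluster (toy values).
cluster = [[0.5, 0.3, 0.2], [0.3, 0.5, 0.2], [0.4, 0.4, 0.2]]
estimate = mean_value_prediction(cluster)
print(estimate)
```

Every node of the cluster receives the same estimate, which is why this baseline cannot reflect how an individual node's distribution evolves.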

In our experiments, we set the parameters of MF
and BiasedMF to learning rate $\eta = 0.001$ and punishing
(regularization) parameter $\lambda = 0.02$, as suggested by Paterek [9] and
Gorrell et al. [3]. We choose the number of NDV
components as the number of latent features in MF and
BiasedMF. Thus the feature numbers for DBLP, Netflix
and Foursquare are 25, 28 and 8 respectively.
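
To make the role of the two parameters concrete, the following sketch trains a biased matrix factorization by stochastic gradient descent with learning rate 0.001 and regularization 0.02. It is a generic textbook-style update, not the exact procedure of the cited works; the function name, the initialization range, the number of epochs and the toy ratings are all assumptions.

```python
import random


def train_biased_mf(ratings, n_users, n_items, n_factors,
                    lr=0.001, reg=0.02, epochs=200, seed=0):
    """SGD for r_hat = b_u + b_m + sum_k p[u][k] * q[m][k]; returns parameters and SSE per epoch."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    bu = [0.0] * n_users
    bm = [0.0] * n_items
    history = []
    for _ in range(epochs):
        sse = 0.0
        for u, m, r in ratings:
            pred = bu[u] + bm[m] + sum(p * q for p, q in zip(P[u], Q[m]))
            err = r - pred
            sse += err * err
            # Gradient step on the regularized squared error.
            bu[u] += lr * (err - reg * bu[u])
            bm[m] += lr * (err - reg * bm[m])
            for k in range(n_factors):
                pu, qm = P[u][k], Q[m][k]
                P[u][k] += lr * (err * qm - reg * pu)
                Q[m][k] += lr * (err * pu - reg * qm)
        history.append(sse)
    return P, Q, bu, bm, history


# Toy ratings as (object of type A, object of type B, value) triples.
toy_ratings = [(0, 0, 0.8), (0, 1, 0.2), (1, 0, 0.6), (1, 1, 0.4)]
P, Q, bu, bm, history = train_biased_mf(toy_ratings, n_users=2, n_items=2, n_factors=2)
print(history[0], history[-1])
```

Setting the number of factors to the number of NDV components, as described above, only changes the `n_factors` argument.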

5.3 The Determination of K 

The learning algorithm for EFM requires the number of
clusters, $K$, as input when it invokes the K-means
procedure, so we have to determine $K$ before we start
our experiments. For each dataset, we first randomly
select 500 nodes from it, then apply our model to
make predictions for these nodes and choose the $K$
that maximizes the average absolute accuracy of the
predictions. As Fig. 3 shows, we finally get $K = 5$
for DBLP, $K = 1$ for Netflix, and $K = 155, 156$ for
Foursquare during [2:00, 3:00] and [11:00, 12:00]
respectively.




(a) DBLP and Netflix (b) Foursquare

Figure 3: The Selection of K


5.4 The Validation of Predictability 

Now we investigate how the absolute accuracy of a
prediction correlates with its prediction difficulty. For each
dataset, we first rank the nodes by PD in descending
order, then divide the nodes into five groups. The nodes
in the same group have equal PD. Finally we observe the
absolute accuracies obtained by applying EFM and the three baseline
methods, MVM, MF and BiasedMF, to the five groups
respectively.

The results are shown in Fig. 4. We can see
that the absolute accuracies of the methods we use in
the experiments decrease overall as the prediction
difficulty increases. This result validates our
assumption that the more disordered the temporal
pattern of a node is, the greater its predictability is.
It also shows the necessity of assessing a model by a
fair metric that takes predictability into
consideration.

Note that, on Foursquare, the absolute accuracies of the
baseline methods do not decrease linearly with PD. This is
because people's daily routines are not all the same,
which leads to the fluctuation of the absolute accuracies
of the baseline methods. The absolute accuracy of EFM,
however, decreases linearly with PD. This is because
EFM is not limited to the recognition of daily patterns,
but instead takes the evolution regularity (represented
by the neighbor label evolution matrix in EFM) into
consideration. For example, office workers who
like nightlife may stay overnight at a nightclub only
on weekends. For nodes of that type, the
baseline methods cannot perform as expected because
staying out overnight on weekends is not common
to everyone (which is the reason why MVM's curve
fluctuates), nor to an individual on every day (which is the
reason why the curves of MF and BiasedMF fluctuate).

5.5 The Comparison between EFM and Baseline
Methods

In this part, we compare the virtual accuracy of EFM
with that of the three baseline methods: MVM, MF and BiasedMF.
The summarized result is shown in Fig. 5 and the
detailed result is listed in Table 2. Our remarks on the
result are as follows:

(1) EFM performs far better than MVM on all the
datasets, while MVM has the worst performance
on the Netflix and DBLP datasets.

(2) Although it has the worst performance on the
two Foursquare datasets, MF performs well
on the Netflix dataset compared with
the other baseline methods, since MF was originally
proposed for the movie recommendation problem
in Netflix. EFM, however, still performs better
than MF, because EFM takes into
consideration not only the profile of the predicted
nodes, but also the evolution regularity.

(3) As shown in Fig. 5, EFM outperforms BiasedMF,
which performs similarly to MVM on DBLP and
Netflix, but far worse on Foursquare.

(4) EFM outperforms MVM, MF and BiasedMF
especially on Foursquare. This is because the three
baseline methods only pay attention to the daily
pattern (MVM) or the profile of users (MF and
BiasedMF). However, the activities in Foursquare are
governed not only by the daily pattern or profile of
users, but also by the evolution regularity over weeks,
even months.

In summary, EFM is a robust and effective method.
The VA of EFM is generally better than that of all the baseline
methods.

Figure 4: The Relationship between Absolute Accuracy
and Prediction Difficulty on Three Datasets

Figure 5: The Comparison of VA between EFM and
Baseline Methods

Method    DBLP    Netflix  Foursquare    Foursquare
                           [2:00, 3:00]  [11:00, 12:00]
EFM       0.5049  0.4960   0.5722        0.5606
MVM       0.4359  0.4216   0.5179        0.5032
MF        0.4937  0.4917   0.3793        0.3619
BiasedMF  0.4360  0.4217   0.4381        0.4852

Table 2: Virtual Accuracies of EFM, MVM, MF and
BiasedMF

6 Related Work

Three domains are relevant to our work, namely link 
prediction, rating prediction and factor model. 

Link Prediction: Hasan et al. [1] first introduce
supervised learning to predict whether a link will be
built in the future. Wang et al. introduce a
probabilistic model for link prediction. Leroy et al.
[6] solve the cold start problem in link prediction.
Lichtenwalter et al. [7] propose new perspectives and
methods in link prediction. Taskar et al. propose
a method to address the problem of link prediction in
heterogeneous networks, based on observations of
the attributes of the objects. Sun et al. [12] extend
traditional link prediction to relationship prediction,
which not only predicts whether a link will appear, but also
infers when it will appear. However, Sun's work still
focuses on the prediction of a single link.

Rating Prediction (Recommender System):
Basically, the methods predicting ratings of links fall
into two categories: memory-based algorithms and
model-based algorithms. The memory-based algorithms
directly make predictions based on homogeneous
neighbors of a given node [8] [10] [11], while model-based
algorithms make predictions based on a prediction model
learned in advance. Savia et al. propose a prediction
model based on Bayesian networks. These existing
methods pay insufficient attention to the evolution of
neighbor distributions.

Factor Model: The Factor Model, also called Matrix
Factorization (MF), assumes the features
of objects can be expressed as a series of factors.
MF was first
proposed by Webb [17] to solve the movie recommendation
problem in the Netflix Prize. Based on Webb's work,
G. Gorrell et al. [3] optimize the learning rate and
the punishing parameter in MF. Paterek [9] proposes
Biased Matrix Factorization (BiasedMF) to improve
the performance of MF. However, the existing methods
cannot capture the dynamics of neighbor distributions,
which is exactly why we propose a new model,
EFM, for our goal.

7 Conclusion 

In this paper, we present a new prediction problem,
Neighbor Distribution Prediction in heterogeneous
information networks. To address this problem, we
propose an Evolution Factor Model (EFM), which takes the
Neighbor Label Evolution Matrix (NLEM) as the dynamic
factor, and predicts the next NDV of a given
node by transforming its current NDV by the NLEM.
We also propose a learning algorithm for EFM, which
learns the NLEM from the homogeneous nodes that
are in the same cluster as a given node.

To evaluate the predictions made by different
methods fairly, we propose Virtual Accuracy, which not
only measures the absolute accuracy, but also takes the
difficulty of a prediction into consideration.

We conduct experiments on datasets from
three different applications, and compare EFM with
three baseline methods: Mean Value Method, Basic
Matrix Factorization and Biased Matrix Factorization.
The results show that EFM outperforms all the baseline
methods overall.